Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change lakeFS package to lakefs-sdk in place of lakefs-client #107

Merged
merged 11 commits into from
Oct 24, 2023

Conversation

nicholasjng
Copy link
Collaborator

In preparation for the lakeFS v1 launch, change the package to depend on their newly launched lakefs-sdk to keep up to date with their upstream changes.

The import change is largely a name replacement only, with the exception of the model imports now happening from the lakefs_sdk.models subpackage.

@nicholasjng nicholasjng marked this pull request as draft October 16, 2023 12:57
@nicholasjng nicholasjng requested a review from AdrianoKF October 16, 2023 12:58
@nicholasjng
Copy link
Collaborator Author

So I assume the errors mean that I now need to read() the file contents on my own in objects_api.upload_object instead of handing off the file descriptor?

@AdrianoKF
Copy link
Contributor

So I assume the errors mean that I now need to read() the file contents on my own in objects_api.upload_object instead of handing off the file descriptor?

Yup, seems so. upload_object now takes a content: bytearray argument: https://github.com/treeverse/lakeFS/blob/master/clients/python/docs/ObjectsApi.md#upload_object

@nicholasjng
Copy link
Collaborator Author

This is an upstream issue, tracked in treeverse/lakeFS#6795.

In preparation for the lakeFS v1 launch, change the package to depend
on their newly launched `lakefs-sdk` to keep up to date with their
upstream changes.
This is largely a name replacement only, with the exception of the model
imports now happening from the `lakefs_sdk.models` subpackage.
Mostly a name-only replacement.
Otherwise we end up with a dependency conflict in CI.
This change comes from the move to pydantic, which requires keyword
arguments for its initializations.
This was changed in the lakeFS SDK from the buffered reader used before.

Now, we need to fetch byte ranges, which we were doing in the
`LakeFSFile` already, with every `objects_api.get_object` call.

The main difference, however, is that we cannot just pass increments of
the file system's chunk size, because that will create HTTP416 errors
(request range not satisfiable).

Instead, first fetch the object size from the backend via `fs.info()`,
then guard against the overflow by max'ing it with the next increment
of the chunksize.

We also change a test to purposefully require multiple `get_object`
range requests to see if the logic works as intended.
@nicholasjng nicholasjng marked this pull request as ready for review October 24, 2023 11:09
@codecov-commenter
Copy link

codecov-commenter commented Oct 24, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (15aba20) 88.84% compared to head (5f77978) 87.15%.
Report is 2 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
- Coverage   88.84%   87.15%   -1.69%     
==========================================
  Files           7        7              
  Lines         466      467       +1     
  Branches       61       62       +1     
==========================================
- Hits          414      407       -7     
- Misses         29       35       +6     
- Partials       23       25       +2     
Files Coverage Δ
src/lakefs_spec/__init__.py 77.77% <100.00%> (ø)
src/lakefs_spec/client_helpers.py 92.10% <100.00%> (-0.40%) ⬇️
src/lakefs_spec/errors.py 84.21% <100.00%> (ø)
src/lakefs_spec/hooks.py 95.23% <100.00%> (ø)
src/lakefs_spec/util.py 76.47% <100.00%> (ø)
src/lakefs_spec/spec.py 85.48% <85.71%> (-2.41%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pyproject.toml Outdated Show resolved Hide resolved
@AdrianoKF
Copy link
Contributor

AdrianoKF commented Oct 24, 2023

The demo notebook still imports from the old lakefs_client package, but it seems only used for a type hint. Can you update it, please?

Edit: same goes for the readme and uses cases documentation

@nicholasjng nicholasjng merged commit 8d49758 into main Oct 24, 2023
3 checks passed
@nicholasjng nicholasjng deleted the lakefs-sdk branch October 24, 2023 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants